Support tokenizer override per model for multi-model Triton + vLLM serving with OpenAI-Compatible #8321

JunmooByun · 2025-07-31T05:08:17Z

What does the PR do?

Support per-model tokenizer override when using Triton + vLLM in OpenAI-compatible mode.

This PR introduces HF_MODEL_NAME_MAP to associate custom model names with their corresponding Hugging Face model identifiers. During model registration, if a mapping is found, the tokenizer is loaded accordingly; otherwise, the system falls back to the default tokenizer.

This enables multi-model serving in scenarios where each model requires a different tokenizer, which the previous global --tokenizer option could not support because it applied a single tokenizer to every model.
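The lookup-with-fallback behavior described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes HF_MODEL_NAME_MAP is a plain dictionary, and the helper name resolve_tokenizer_id and the example map entry are hypothetical.

```python
from typing import Dict, Optional

# Assumption: HF_MODEL_NAME_MAP is a plain dict from the served (custom)
# model name to a Hugging Face model identifier. The entry below is
# illustrative only and does not come from the PR.
HF_MODEL_NAME_MAP: Dict[str, str] = {
    "my-llama": "meta-llama/Llama-3.1-8B-Instruct",
}


def resolve_tokenizer_id(
    model_name: str,
    default_tokenizer: Optional[str],
    name_map: Optional[Dict[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the tokenizer identifier for one model.

    Returns the mapped Hugging Face identifier when the model name has an
    entry in the map; otherwise falls back to the global default (which
    may be None if no default tokenizer was configured).
    """
    if name_map is None:
        name_map = HF_MODEL_NAME_MAP
    return name_map.get(model_name, default_tokenizer)


# Mapped model: the per-model override wins over the global default.
print(resolve_tokenizer_id("my-llama", "gpt2"))
# Unmapped model: the global default is used.
print(resolve_tokenizer_id("other-model", "gpt2"))
```

The returned identifier would then be handed to whatever tokenizer loader the frontend uses (for example, transformers' AutoTokenizer.from_pretrained), once per model at registration time.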

Checklist

Commit Type:

feat

Related PRs:

Where should the reviewer start?

python/openai/openai_frontend/engine/triton_engine.py: Tokenizer override logic introduced here.

Test plan:

Ran the frontend with:

python3 openai_frontend/main.py --model-repository tests/vllm_models

JunmooByun · 2025-08-04T01:32:13Z

This PR was created from a forked repository.

The branch has been updated to the latest main.
Currently, workflow approval and a code review are required.

Could you please:

Approve and run the workflows
Review and approve the PR

Thanks for your time!

JunmooByun added 2 commits July 31, 2025 13:20

Add tokenizer override logic for mapped HF model names

cde2484

Merge branch 'main' into feat/tokenizer-override

36d7b43

JunmooByun marked this pull request as draft August 4, 2025 01:19

JunmooByun marked this pull request as ready for review August 4, 2025 01:29
